Next: Recognize Coding, Previous: Select Input Method, Up: International [Contents][Index]
Users of various languages have established many more-or-less standard coding systems for representing them. Emacs does not use these coding systems internally; instead, it converts from various coding systems to its own system when reading data, and converts the internal coding system to other coding systems when writing data. Conversion is possible in reading or writing files, in sending or receiving from the terminal, and in exchanging data with subprocesses.
Emacs assigns a name to each coding system. Most coding
systems are used for one language, and the name of the coding
system starts with the language name. Some coding systems are
used for several languages; their names usually start with
‘iso’. There are also special coding
systems, such as no-conversion,
raw-text, and
emacs-internal.
A special class of coding systems, collectively known as
codepages, is designed to support text encoded by
MS-Windows and MS-DOS software. The names of these coding systems
are cpnnnn, where nnnn is a 3-
or 4-digit number of the codepage. You can use these encodings
just like any other coding system; for example, to visit a file
encoded in codepage 850, type C-x RET c cp850 RET C-x C-f
filename RET.
In addition to converting various representations of non-ASCII characters, a coding system can perform end-of-line conversion. Emacs handles three different conventions for how to separate lines in a file: newline (Unix), carriage-return linefeed (DOS), and just carriage-return (Mac).
Describe coding system coding
(describe-coding-system).
Describe the coding systems currently in use.
Display a list of all the supported coding systems.
The command C-h C
(describe-coding-system) displays information about
particular coding systems, including the end-of-line conversion
specified by those coding systems. You can specify a coding
system name as the argument; alternatively, with an empty
argument, it describes the coding systems currently selected for
various purposes, both in the current buffer and as the defaults,
and the priority list for recognizing coding systems (see
Recognize
Coding).
To display a list of all the supported coding systems, type M-x list-coding-systems. The list gives information about each coding system, including the letter that stands for it in the mode line (see Mode Line).
Each of the coding systems that appear in this
list—except for no-conversion, which means no
conversion of any kind—specifies how and whether to convert
printing characters, but leaves the choice of end-of-line
conversion to be decided based on the contents of each file. For
example, if the file appears to use the sequence carriage-return
linefeed to separate lines, DOS end-of-line conversion will be
used.
Each of the listed coding systems has three variants, which specify exactly what to do for end-of-line conversion:
…-unixDon’t do any end-of-line conversion; assume the file uses newline to separate lines. (This is the convention normally used on Unix and GNU systems, and Mac OS X.)
…-dosAssume the file uses carriage-return linefeed to separate lines, and do the appropriate conversion. (This is the convention normally used on Microsoft systems.8)
…-macAssume the file uses carriage-return to separate lines, and do the appropriate conversion. (This was the convention used on the Macintosh system prior to OS X.)
These variant coding systems are omitted from the
list-coding-systems display for brevity, since they
are entirely predictable. For example, the coding system
iso-latin-1 has variants
iso-latin-1-unix, iso-latin-1-dos and
iso-latin-1-mac.
The coding systems unix, dos, and
mac are aliases for undecided-unix,
undecided-dos, and undecided-mac,
respectively. These coding systems specify only the end-of-line
conversion, and leave the character code conversion to be deduced
from the text itself.
The coding system raw-text is good for a file
which is mainly ASCII text, but may contain
byte values above 127 that are not meant to encode
non-ASCII characters. With
raw-text, Emacs copies those byte values unchanged,
and sets enable-multibyte-characters to
nil in the current buffer so that they will be
interpreted properly. raw-text handles end-of-line
conversion in the usual way, based on the data encountered, and
has the usual three variants to specify the kind of end-of-line
conversion to use.
In contrast, the coding system no-conversion
specifies no character code conversion at all—none for
non-ASCII byte values and none for end of
line. This is useful for reading or writing binary files, tar
files, and other files that must be examined verbatim. It, too,
sets enable-multibyte-characters to
nil.
The easiest way to edit a file with no conversion of any kind
is with the M-x find-file-literally command. This uses
no-conversion, and also suppresses other Emacs
features that might convert the file contents before you see
them. See Visiting.
The coding system emacs-internal (or
utf-8-emacs, which is equivalent) means that the
file contains non-ASCII characters stored with
the internal Emacs encoding. This coding system handles
end-of-line conversion based on the data encountered, and has the
usual three variants to specify the kind of end-of-line
conversion.
It is also specified for MIME ‘text/*’ bodies and in other network transport contexts. It is different from the SGML reference syntax record-start/record-end format, which Emacs doesn’t support directly.
Next: Recognize Coding, Previous: Select Input Method, Up: International [Contents][Index]